%METHOD WARS REF - the idea being that the method wars were a period of experimentation with the component diagrams of UML, after this period of experimentation, the main protagonists came together first in Rational and then in IBM, to produce what has become UML. The method wars period, strengthened UML.
% it is a good and healthy evolution in modelling


%http://www.umlforum.com/docs/papers/Booch-CACM-Oct99-UML2.pdf


%This chapter talks about the importance of integrating EHR with terminology, reflection, experience gained etc
 
% REVIEW ALL SECTION NUMBERS!!!!
 
\chapter{Discussion and conclusions}


In previous chapters, the author has surveyed the current state of the art in clinical information models and in the meta-data that express the semantics of clinical content in an Electronic Health Record (EHR). The author has also investigated SNOMED-CT, one of the largest terminological resources designed for use with EHRs. 


The detailed work of this thesis, which deals with different aspects of terminological shadows as a ``bridge'' between clinical information models and clinical terminology, has now been presented in full. This chapter concludes the study by revisiting the research objectives in light of the work reported in this document and by discussing the contributions described in the earlier chapters. It also reflects on the author's experience of creating a framework for integrating EHR meta-data with terminologies. 


%The sections in this chapter encompass the contribution to understanding in this research area, the experience gained for producing an approach to integrating EHR meta data with terminologies, discussion points and considered opinions that have not been included in previous chapters and the plan to expand and improve the work in the future.
 
\section{Background to the contributions} 


This thesis has so far reported work intended to facilitate better integration between EHR information models and clinical terminologies. A number of original contributions have arisen from this study which should benefit the research communities that develop and use two-level EHRs and clinical terminologies. This section describes the context of these contributions. 


As indicated at various points in chapter 2, a number of research projects and healthcare
organisations have shown an interest in adopting archetypes. The development of archetypes 
as meta-data for a two-level EHR model relies on a community-based approach. As a result, archetype
designers are creating their own versions of what should be communicated between EHR systems, and
archetypes are proliferating. As reported in section~\ref{sec:repo}, in each of the NHS, openEHR and NEHTA
archetype repositories more than half of the archetypes differ from those in the other repositories. Managing slightly different archetype designs based on similar clinical concepts is problematic because the archetype evaluation cycle is slow and manual inspection is needed to judge whether two archetypes are semantically close. As anyone who has tried it will observe, the manual task of annotating archetypes with SNOMED-CT codes is ambiguous, and therefore iterative and time-consuming. Automatic comparison can reduce this workload by identifying and highlighting similarities, allowing experts to apply their expertise at a deeper level.
 
The terminological shadow approach presented in this work, particularly the automatic SNOMED-CT
binding mechanism, is not intended to replace the effort of terminology experts, nor the
manual classification and categorisation of archetypes by experts. Rather, since the review process
usually lags behind the design process \parencite{alberto2011mie}, manual annotation and categorisation
are likely to proceed more slowly than the number of archetypes grows, and automatic categorisation can support the manual annotation process. This has several benefits, as discussed in section~\ref{sec:paper2-motivation}.
%For example, a document index can be used to support searching and finding desired documents from a document collection. In the same way, archetypes need to be properly indexed since a large or general archetype may contain a large number of clinical concepts. Otherwise one has to first locate the  “closest” archetype to the topic, then observe each node in it to find the required concept. A simple string matching approach will not always satisfy this need since for example medical concepts with synonyms would not then be taken into account. 
This work aims to reduce the manual effort associated with searching for clinical concepts in archetypes, while also providing the experts with material that they are familiar with: concepts from SNOMED-CT.


This document has noted the slow collision between EHR information models and terminologies. There are clearly difficult challenges associated with this merging of paradigms, including the manual binding process and the management of semantic clinical content when definitions overlap across multiple archetype repositories. The next section presents the author's contributions towards an EHR that is fully integrated with clinical terminologies.


\section{Overview of contribution}
  
 
This thesis has set out to investigate how terminological resources and information models could be used together in a highly collaborative and distributed health community. 
A number of peer-reviewed papers \parencite{syu2010, syu5999029, syu2012} have been published and the resources that have been created are made available on the EHRland project website\footnote{http://www.ehrland.ie/down\_demo.html}.

The work that has been reported in this document has provided a discussion of how technological advances in medical science and biology have led to a collaborative healthcare environment. A discussion of associated technological advances can be found in section~\ref{sec:tech-advances}.


The author has argued in section~\ref{sec:the-problem} that a significant requirement for delivering high-standard healthcare is a high-quality, patient-oriented health information system, and that this requires information models and terminology to work together seamlessly.
It has been noted that the need for, and the benefits of, the integrated-care electronic health record are described in the ISO document ``Health informatics -- Electronic health record -- Definition, scope and context''. Section~\ref{sec:info-exchange} of chapter 2 elaborated on this theme and described how the shift in healthcare delivery towards shared care has increased the demand for deeper integration of healthcare information systems (including the binding of clinical meaning to shared information), so that health professionals and clinical researchers can all access and understand health-related data seamlessly. With a well-integrated EHR, shared care delivered by distributed, multi-disciplinary teams of healthcare experts is supported by a coherent patient health record, while medical scientists can carry out large-scale research on high-quality, and therefore trusted, anonymised patient data.




Given the benefits of the integrated-care EHR, an essential task in providing a patient-centred view of health information is to integrate the various clinical systems that play different roles in the collaborative healthcare space. This integration rests on the coherent use of models and clinical terminology.


In Chapter 2, the author described and presented statistics for six different archetype repositories,
and both the number of archetypes and the number of repositories continue to grow. As the popularity of the archetype
modelling approach grows, multiple archetype repositories such as those described in
section~\ref{sec:repo}, focusing on different clinical use cases, have emerged. The re-use
of existing archetypes and the management of archetype modelling therefore become crucial for archetype
developers. Also, as the number of archetypes in a single repository rises, so does the likelihood
of overlapping content. Unless strong safeguards are in place, archetypes with overlapping clinical
content and slight differences start appearing in different repositories. When new archetypes are created,
these problems may arise from the following causes:
% tease out the problems with re-using archetypes, writing new archetypes
\begin{enumerate}
\item differing purposes: archetypes for clinical research have a different emphasis and different actors from those for cancer-related population health reporting or for cancer care;
\item differing opinions about data elements, data types, the use of coding, identifiers and timestamps;
\item differences between reference models, such as EN13606 and openEHR, which lead to problems if they are ignored.
\end{enumerate}


% ADD GENERALISED case, other place with models and terminology can also be applied.
% add description to refer EAV (EAV/CR)
Based on experience with existing repositories, the author's opinion is that archetype development is in a period of experimentation. In this context, it is desirable to allow archetype developers to acknowledge and manage the relatedness of overlapping content in different archetypes, whether within the same repository or across different repositories. The work reported in this thesis contributes to solving these problems.  
This thesis has identified an important area for enhancing the process of producing a semantic
healthcare record that aims to integrate every aspect of health information. The work is dedicated to exploring the integration of two key components: the information models of electronic health records and clinical terminologies. Contributions in this area can inform the ongoing development of both EHR meta-data and large clinical terminologies such as SNOMED-CT, which together make up the higher-level semantic part of a healthcare record. At a higher level, the primary and general contribution of this work is \textbf{the introduction and implementation of a novel mediating resource called the \emph{Terminological Shadow} and a framework that facilitates a detailed comparison of elements of information models with corresponding parts of a terminological system}.


The framework designed by the author has been demonstrated through the instantiation of
terminological shadows and a number of extended studies focusing specifically on
electronic healthcare records. Shadows are relatively direct representations of the binding between EHR information model meta-data and terminological concepts. The terminological shadow approach was applied to calculate the coverage of SNOMED-CT concepts by existing archetypes in the NHS repository. 
A second application of the approach has also been reported in the thesis: using terminological shadows to measure the semantic relatedness of archetypes by comparing their shadows. 
Each contribution is discussed in detail in section~\ref{sec:contrib-detail}. 
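To make the idea of a shadow more concrete, the following Python sketch shows one possible in-memory representation. It assumes, for illustration only, that a shadow is a list of entries, each pairing an archetype node (identified by its path) and the node's Reference Model class with a set of candidate SNOMED-CT concept identifiers. The class names \texttt{ShadowEntry} and \texttt{TerminologicalShadow}, the archetype identifier, the paths and the concept identifiers (\texttt{C1}, \texttt{C2}, \ldots) are hypothetical placeholders and are not taken from the framework described in this thesis.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ShadowEntry:
    """One archetype node together with the SNOMED-CT concepts proposed for it."""
    node_path: str          # path of the node inside the archetype (illustrative)
    rm_class: str           # Reference Model class of the node: the context information
    concept_ids: frozenset  # candidate SNOMED-CT concept ids (placeholders here)


@dataclass
class TerminologicalShadow:
    """Hypothetical container for the shadow of a single archetype."""
    archetype_id: str
    entries: list = field(default_factory=list)

    def add(self, node_path, rm_class, concept_ids):
        """Record the candidate concepts for one archetype node."""
        self.entries.append(ShadowEntry(node_path, rm_class, frozenset(concept_ids)))

    def concepts(self):
        """Return every SNOMED-CT concept the shadow points to."""
        result = set()
        for entry in self.entries:
            result |= entry.concept_ids
        return result


shadow = TerminologicalShadow("openEHR-EHR-OBSERVATION.example.v1")
shadow.add("/data[at0001]/items[at0004]", "ELEMENT", {"C1", "C2"})
shadow.add("/data[at0001]/items[at0005]", "ELEMENT", {"C2", "C3"})
print(sorted(shadow.concepts()))  # ['C1', 'C2', 'C3']
```

Keeping the Reference Model class alongside each binding is what would allow context information to be used later, for example when restricting queries or when analysing which parts of SNOMED-CT relate to which Reference Model classes.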


\section{Details of the contributions}
\label{sec:contrib-detail}


\subsection{The shadow approach is not a terminology service}
 
This work differs from the maturing terminology service technology mentioned in section~\ref{sec:term-service}, whose intention is to provide a unified interface for accessing and utilising
terminology resources. A common terminology service eases the development of client applications that access multiple terminology resources. The focus of this study, however, is to discover the inner links between the EHR information model and the terminology model. This should enable users to apply terminology resources more appropriately and thus promote semantic interoperability.
 
\subsection{What is new}


In contrast to projects mentioned in section~\ref{sec:terminfo} such as TermInfo, which investigated how to use SNOMED-CT with HL7 messages, this project reveals the general relationship between health information models such as archetypes and clinical terminologies. It creates an intermediate representation called a `terminological shadow', which in this case draws on the strengths of both the Archetype Object Model and SNOMED-CT to aid the development of a semantic environment for EHRs.


What is new in this approach is that context information from both the archetypes and the structure of SNOMED-CT is taken into account and used as meta-data. By analogy, this is similar to what the Semantic Web does for ordinary web documents, where an ontology is used to formally represent the knowledge in, and the relationships between, documents. With a much narrower scope than the Semantic Web, the approach taken in this thesis uses SNOMED-CT as a lexical resource and ontology to aid manual tasks such as creating and maintaining archetypes. Terminological shadows are an intermediate format containing both the structural information from archetypes and the possibly corresponding concepts from SNOMED-CT. Using them, one can discover the coverage of existing archetypes, compare archetypes to find similarities, and query shadows to find relevant archetypes. Without the shadow approach, these tasks are not automated in the current environment. \textbf{On existing platforms, these tasks are typically handled by:}


\begin{itemize} 
\item Manual classification and categorisation of archetypes (a daunting task when the number of archetypes is large); 
\item Manual inspection of archetypes when comparison is needed (possibly requiring a full cycle of archetype review); 
\item Third-party tools to search and query archetypes (using string matching).
\end{itemize}


\textbf{With terminological shadows one can:}
\begin{itemize} 
\item Automatically categorise archetypes and track their coverage with respect to SNOMED-CT;
\item Compare archetypes (the semantic similarity is loosely based on the concept model in SNOMED-CT); 
\item Execute queries on archetypes that take into account context information from the archetypes and from the Reference Model.
\end{itemize}
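The query capability listed above can be sketched as an inverted index from SNOMED-CT concepts to the archetype nodes whose shadows contain them. The following Python sketch is a minimal illustration under stated assumptions: shadows are plain dictionaries of hypothetical archetype identifiers, node paths, Reference Model class names and placeholder concept identifiers, and the only context used is an optional filter on the Reference Model class. A fuller implementation would presumably also expand the query using SNOMED-CT relationships, which is not shown here.

```python
from collections import defaultdict

# Hypothetical shadows: archetype id -> list of (node path, RM class, SNOMED-CT concept ids).
# The concept ids are placeholders, not real SNOMED-CT codes.
shadows = {
    "archetype-A": [("/items[at0004]", "ELEMENT", {"C1", "C2"}),
                    ("/items[at0010]", "CLUSTER", {"C5"})],
    "archetype-B": [("/items[at0002]", "ELEMENT", {"C2"})],
    "archetype-C": [("/items[at0007]", "ELEMENT", {"C9"})],
}


def build_index(shadows):
    """Invert the shadows: concept id -> list of (archetype id, node path, RM class)."""
    index = defaultdict(list)
    for archetype_id, entries in shadows.items():
        for path, rm_class, concept_ids in entries:
            for concept_id in concept_ids:
                index[concept_id].append((archetype_id, path, rm_class))
    return index


def query(index, concept_ids, rm_class=None):
    """Archetypes whose shadow contains any of the given concepts,
    optionally restricted to nodes of one Reference Model class."""
    hits = set()
    for concept_id in concept_ids:
        for archetype_id, _path, node_class in index.get(concept_id, []):
            if rm_class is None or node_class == rm_class:
                hits.add(archetype_id)
    return sorted(hits)


index = build_index(shadows)
print(query(index, {"C2"}))                      # ['archetype-A', 'archetype-B']
print(query(index, {"C5"}, rm_class="ELEMENT"))  # []
```

Unlike a string-matching search over archetype text, the lookup here is by concept, so two archetypes that use different wording for the same SNOMED-CT concept are both returned.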


\subsection{Discovery of the semantic gap}


In the area of health informatics, the gaps between EHR information models and medical terminology systems were not a major focus of research during the early evolution of electronic health records. However, as numerous studies have pointed out and attempted to address, a number of semantic interoperability issues are explicitly or implicitly related to differences between the EHR information model and clinical terminology (see section~\ref{sec:term-equal}). These differences, also referred to as semantic gaps, often manifest as incompatibilities when terminology is incorporated into EHRs. 
%One example issue is that despite human experts' speciality in clinical knowledge, they may find it difficult to associate appropriate concepts in a clinical terminology to an existing EHR information model. This is because collisions of similar concepts exist in both systems and the usages differ in syntax and structure. One simple solution is to include certain concepts that are frequently used in EHR information models in a master terminology. 
As a large terminology, SNOMED-CT encompasses concepts that also appear in EHR information models; for example, certain
SNOMED-CT concepts describe specific data elements in a record structure (see the first paragraph of
section~\ref{sec:sno-ehrcontext}). 
While the incompatibility is generally considered to result from the independent development of the two systems, a generic study of the gap has been lacking, and this study has addressed that need.


As previously noted, various projects have attempted to address some of these semantic interoperability issues; e.g.\ the TermInfo project produced guidelines for integrating HL7 and SNOMED-CT. However, the outcome was HL7-specific. Based on the experience gained during this work, the author believes it can inform future research into the bond between EHR models and terminologies, especially for emerging two-level EHR models such as EN13606 and openEHR, where the number of archetypes is growing far faster than the number of terminology bindings.






% note: to provide background for this section, describe each terminology ICD, LOINC and SNOMED-CT in chapter two(?) also mention terminology from another domain.
% text in this section should also be consistent in ch 1,2


A distinctive feature of the literature survey in this thesis is that it provides a state-of-the-art
review of the technologies relevant to the semantic gap between EHR information models and medical
terminology systems. This focus on the semantic gap leads to \textbf{Contribution C.1}.
 \begin{quote}
\textsf{Contribution C.1 This work has explored the state of the art in technologies and approaches for using
clinical terminologies to enhance the effectiveness of contemporary clinical information models for
an integrated-care EHR.}
 \end{quote}




\subsection{Initial investigation}
Section~\ref{sec:related-work} in chapter 3 reviewed the contemporary approaches and projects that are related to the integration of EHR information models and clinical terminologies.
The author also investigated the archetype-terminology binding mechanism specified by the Archetype Object Model to link archetypes to SNOMED-CT. The investigation formed the foundation of the conceptual idea of a ``Terminological Shadow''. The results of the investigation show that:

\begin{enumerate}[(a)]
  \item The current state-of-the-art binding method involves the manual creation of bindings by a clinical expert, during or after
the archetype creation process. Statistics on existing manual bindings showed that
the number of manual bindings is relatively low compared to the number of archetypes being created.
A possible cause is the modelling approach developed by the openEHR organisation, which
leaves binding to be completed at a later, more specific stage of developing an EHR
system: bindings can be added in so-called ``openEHR templates'' to fit
particular clinical usage \parencite{leslie2008international}. In the author's view, archetypes, as self-contained, general and shared information artefacts consisting of clinical meta-data, may benefit more from terminology binding than a more specific level such as openEHR templates. SNOMED-CT bindings embedded in an archetype can provide more functionality to clinical users and EHR platform developers.


\item Few tools help experts decide which SNOMED-CT concepts should be
used in a given archetype. A small number of tools have been developed to help clinical experts
bind archetypes to SNOMED-CT, but most of them focus on particular functions such as
navigating or searching SNOMED-CT. The author believes that, because semantic gaps exist between
archetypes and SNOMED-CT, such tools are helpful but not sufficient to narrow the semantic difference between
the two information modelling approaches. The author has concluded that a better integration platform could potentially solve the problem.
\end{enumerate}

\subsection{Conceptualisation of Terminological Shadows and a generic framework}


Based on the observations of EHR information models and clinical terminologies described in chapters 2 and 3, 
and after reviewing the related work in section 3.3, the author proposed in chapter 4 a new approach to facilitate
the integration of EHR information models and clinical terminologies. In this approach, the author took inspiration from earlier work by 
\textcite{bisbal2009arch-align}, mentioned in section~\ref{sec:shadow-overview}, to conceptualise a mediating resource called a `Terminological Shadow' that represents the semantic meaning of clinical information in archetypes by linking EHR information model meta-data with terminological concepts. The author also designed a generic framework that allows the instantiation of terminological shadows and the evaluation of the shadow approach. 
The design of the framework has been described in chapter 4, together with a two-step evaluation. 

 
For the purposes of assessment, this framework was deliberately kept straightforward and transparent, rather than being an elaborate one that works with multiple EHR formats and standards. The focus is on demonstrating how a framework for testing EHR-terminology integration can be constructed.


The conceptual framework provides a minimum architecture that can be used to test future integration methods. It can be extended to benefit more researchers who want to improve the integration of EHR information models and terminologies.
% it has been shown in section..
The framework provides the following main functionalities to allow researchers to:
\begin{enumerate}[(a)]
  \item Import EHR information model constructs and artefacts such as archetypes
  \item Import clinical terminologies such as ICD and SNOMED-CT
  \item Test integration algorithms and methods
  \item Utilise terminological shadows (use them in an application)
\end{enumerate}
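The plug-in style implied by item (c) can be illustrated with a short Python sketch. The interface name \texttt{BindingAlgorithm}, the baseline \texttt{ExactMatch} algorithm and the function \texttt{build\_shadows} are hypothetical; they only show one way in which integration algorithms could be swapped without changing how archetypes and terminologies are imported. The toy dictionaries stand in for imported archetypes and an imported terminology, and the identifiers are placeholders.

```python
from abc import ABC, abstractmethod


class BindingAlgorithm(ABC):
    """Hypothetical plug-in interface: proposes SNOMED-CT concepts for one archetype node."""

    @abstractmethod
    def bind(self, node_text, terminology):
        """Return a list of candidate concept ids for the given node text."""


class ExactMatch(BindingAlgorithm):
    """Baseline algorithm: case-insensitive exact match against concept descriptions."""

    def bind(self, node_text, terminology):
        text = node_text.strip().lower()
        return [cid for cid, desc in terminology.items() if desc.lower() == text]


def build_shadows(archetypes, terminology, algorithm):
    """Run any BindingAlgorithm over every node of every imported archetype."""
    return {
        archetype_id: {node: algorithm.bind(text, terminology) for node, text in nodes.items()}
        for archetype_id, nodes in archetypes.items()
    }


# Toy inputs standing in for an imported archetype (node id -> text) and a terminology.
archetypes = {"archetype-A": {"at0004": "Systolic", "at0005": "Diastolic"}}
terminology = {"C1": "systolic", "C2": "heart rate"}
print(build_shadows(archetypes, terminology, ExactMatch()))
# {'archetype-A': {'at0004': ['C1'], 'at0005': []}}
```

Because \texttt{build\_shadows} depends only on the abstract interface, a \emph{tf-idf} based algorithm or any other method could be substituted for the exact-match baseline and evaluated on the same inputs.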
The introduction and description of the concept of terminological shadows and the steps 
that were described to instantiate them provide evidence of \textbf{contribution C.2}.
\begin{quote}
\textsf{Contribution C.2 This work has introduced a new approach for instantiating an EHR-terminology
mediating resource using the relationship between models of clinical information called
archetypes on one hand and SNOMED-CT, the most substantial clinical terminological
system on the other.}
\end{quote}
% generalise the approach
% specify what features of the resource make the approach applicable 


\subsection{Instantiation of the conceptual shadow framework and evaluation}


%As mentioned in section~\ref{sec:mapping-nlp}, binding and mapping medical text to clinical concepts from terminology systems often involves algorithmic methods that process the text and the terminology. Many implementations of such methods have their root in the area of Natural Language Processing and Information Retrieval. As noted in the literature review, the application of NLP and information retrieval in the health domain and the idea of linking clinical information with standard terminologies are not a new. Nor is the attempt to achieve automation of the linking process so that there is minimum human intervention. However the task of automatically associating data items in archetypes to concepts in a clinical terminology is a tricky one. Numerous projects were focused on mapping text to large lexicons, examples are projects described in section~\ref{sec:most} and section~\ref{sec:mapping-spain}. 

Following the introduction of terminological shadows and the design of
the associated conceptual framework, the author implemented the shadow framework to
reflect the clinical concepts in archetypes. The shadows contain mainly SNOMED-CT concepts that
represent the clinical content of archetypes, together with context information from the reference model. In
this thesis, a generic framework was developed first; the author then extended a \emph{tf-idf} based
algorithm to create terminological shadows of archetypes within the shadow framework. After that, the
author assessed the validity of the shadow approach and the performance of the \emph{tf-idf} based
algorithm in two separate studies. The details of the implementation of the shadow framework and the
findings of the studies are reviewed below.
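For readers unfamiliar with \emph{tf-idf}, the following Python sketch shows the general technique of ranking candidate SNOMED-CT concepts for an archetype node: each concept description is weighted by term frequency times (smoothed) inverse document frequency, and concepts are ranked by cosine similarity to the node's text. This is a generic textbook version written for illustration, not the extended algorithm used in this thesis; the tokeniser, the smoothing of the idf term, the \texttt{top\_k} cut-off and the three toy descriptions with placeholder identifiers are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_tfidf(descriptions):
    """tf-idf vectors for a dict of concept id -> description."""
    docs = {cid: Counter(tokenize(d)) for cid, d in descriptions.items()}
    n = len(docs)
    df = Counter(term for counts in docs.values() for term in counts)
    idf = {term: math.log(n / df_t) + 1.0 for term, df_t in df.items()}  # smoothed idf
    vectors = {cid: {t: tf * idf[t] for t, tf in counts.items()} for cid, counts in docs.items()}
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def rank_concepts(node_text, vectors, idf, top_k=3):
    """Rank concepts by cosine similarity between the node text and each description."""
    counts = Counter(tokenize(node_text))
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}  # unseen terms ignored
    scores = [(cid, cosine(query, vec)) for cid, vec in vectors.items()]
    scores = [(cid, s) for cid, s in scores if s > 0]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


descriptions = {
    "C1": "systolic blood pressure",
    "C2": "diastolic blood pressure",
    "C3": "heart rate",
}
vectors, idf = build_tfidf(descriptions)
print([cid for cid, _ in rank_concepts("Systolic pressure", vectors, idf)])  # ['C1', 'C2']
```

Terms in the node text that never occur in the terminology are ignored, so a node whose label shares no words with any description receives no candidates; this is one way in which the recall of a purely lexical method is limited.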


% NOT TRUE?! 
% It was also shown that shadows can be generated using different binding algorithms to allow more accurate algorithms to be developed independently. Shadows themselves are also useful resources to help understand the archetype-SNOMED relationship and are not only limited to SNOMED-CT and a two-level model based EHR.


%This work has demonstrated a novel approach that provides a framework for experimenting the binding process and a solution to achieve automatic binding, as summarised in the next section. 



In this thesis, the author has explored different approaches that can be used to map archetypes to SNOMED-CT concepts to create shadows. 
%The steps involved in the implementation of the framework and the investigation of the algorithm are as follows:
%\begin{enumerate}[Step 1]
%\item The author has already described the design of the framework in section~\ref{sec:constr_sh}. The focus of the design was not exclusively on the algorithm - the intention was to develop a generic framework into which many algorithms (or combinations of cooperating algorithms and techniques) could be inserted. It was also intended to cope with multiple terminologies and EHR information model meta data formats such as archetypes and HL7 CDA templates.Section~\ref{sec:term-comp}, \ref{sec:binding-comp} and \ref{sec:trav-comp} described the main software components of the framework and later in section~\ref{sec:impl-fw} the implementation was described. The implementation of the framework is considered as a default realisation of the design. This work focused on archetypes as the EHR information model meta data and SNOMED-CT as the major terminology. 
%\item The author adopted a \emph{tf-idf} based search algorithm and performed an initial evaluation of the shadow framework which provides candidate SNOMED-CT concepts that can be bound to archetypes.  The study was described in section~\ref{sec:init-eva} the evaluation process used existing manual bindings as the gold standard to evaluate the accuracy of the algorithm. The result showed that it performed relatively well for a selected set of archetypes.
%\item To eliminate false positives in the study, a more sophisticated evaluation was carried out to test the performance of this binding method as reported in section~\ref{sec:perf-eva}.  The results showed that the algorithm is quite stable on performance. The conclusion was that this method can be adopted to be used on larger data set to evaluate the semantics of archetypes.
%\end{enumerate}
This work has applied the terminological shadow approach to a type of EHR meta-data called archetypes. The study used the NHS archetype repository, which provides a wealth of clinical information forming the basis of an EHR, to generate terminological shadows. By choosing openEHR archetypes, the author also intended to promote the adoption of open standards in healthcare.


Chapter 5 has reported on how the terminological shadows have been instantiated by implementing the framework which creates shadows from archetypes. 
It has been noted that the approach of instantiating terminological shadows was based on a
\emph{tf-idf} binding algorithm. The implementation of the generic shadow framework was described in
section~\ref{sec:impl-fw}, while section~\ref{sec:init-eva} provided a description of the initial
evaluation plan that was performed on a set of selected archetypes. \textbf{Contribution C.3} can be derived from the above summary.


% add summaries from paper 1
% change contribs in chapter 1

\begin{quote}
\textsf{Contribution C.3 This work has described an approach for instantiating terminological shadows for the
health domain using the relationship between models of clinical information called archetypes on one
hand and SNOMED-CT, the most substantial clinical terminological system on the other.} 
\end{quote}

As described in the previous sections, the \emph{tf-idf} algorithm mentioned above was developed to operate within a framework. The implementation of this shadow framework has been described in section~\ref{sec:impl-fw}, where the author also described the process of populating the index of the terminology component and the evaluation process. 

This work has also described how the framework has been used to evaluate the performance of the binding algorithm.
In chapter 5, and particularly in sections~\ref{sec:init-eva} and \ref{sec:perf-eva}, the
author performed evaluations of the framework. The results of the two evaluation experiments
validated a method built on a relatively transparent algorithm, which has the advantage that it can be
easily incorporated and analysed. The resulting analysis contributes to a better
understanding of the performance of a stand-alone linking algorithm. It has been shown that a better
linking mechanism can be built from the analysis of stand-alone algorithms, without having to
deal with a `black box' approach in which the system is too sophisticated to be analysed. More
specifically, \textbf{contribution C.3a} is:
\begin{quote}
\textsf{        
Contribution C.3a A framework has been developed to create and evaluate the efficiency of the resource in terms
of integrating EHR information model meta-data with clinical terminologies.}
\end{quote}


Towards the end of chapter 5, the author investigated suitable algorithms for creating
terminological shadows. Although this investigation was not the focus of the work, the author adopted
a \emph{tf-idf} based method as the default shadow construction algorithm. Although this particular
algorithm was shown in the thesis to achieve sub-optimal performance, it nonetheless demonstrates that
terminological shadows can be created (semi-)automatically. Therefore
\textbf{contribution C.3b} is:
\begin{quote}
\textsf{    
Contribution C.3b The author has developed an approach to (semi-)automatically create terminological shadows.}
\end{quote}




It follows from \textbf{contribution C.3a} that the framework is an adequate tool for assessing the effectiveness of EHR-terminology integration. It can be used to assess the efficiency of archetype-terminology binding algorithms by comparing automatically generated terminological shadows against human annotation. The author believes that both the terminology and the EHR information model meta-data components of the framework are open to implementations that handle other formats and standards. 


The results in section~\ref{sec:init-eva} have shown that terminological shadows can be used to
calculate performance measures and statistics such as \emph{recall} and \emph{precision}.
The author has evaluated the effectiveness of the terminological shadow approach, using a two stage evaluation process that assesses both the viability and the performance of the binding algorithm. In section~\ref{sec:perf-eva}, the tf-idf based algorithm was assessed. A number of characteristics of the algorithm have been revealed and more interestingly, the finding suggested that this algorithm is useful for matching concepts in archetypes to SNOMED-CT categories (see section~\ref{sec:diss-perfeva}). This result formed the basis of the later studies in chapter 6.
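The computation of these measures against manual bindings can be sketched as follows. The Python function below treats the manual bindings as the gold standard and computes micro-averaged precision, $\mathit{TP}/(\mathit{TP}+\mathit{FP})$, and recall, $\mathit{TP}/(\mathit{TP}+\mathit{FN})$, over all node-concept pairs. Scoring only nodes that have a manual binding, and the toy node and concept identifiers, are assumptions made for illustration; the exact counting used in the evaluation reported in this thesis may differ.

```python
def precision_recall(predicted, gold):
    """Precision and recall of automatic bindings against manual (gold) bindings.

    Both arguments map an archetype node id to a set of SNOMED-CT concept ids.
    Only nodes that have a manual binding are scored.
    """
    true_pos = false_pos = false_neg = 0
    for node, gold_concepts in gold.items():
        proposed = predicted.get(node, set())
        true_pos += len(proposed & gold_concepts)    # proposed and manually bound
        false_pos += len(proposed - gold_concepts)   # proposed but not manually bound
        false_neg += len(gold_concepts - proposed)   # manually bound but missed
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


gold = {"at0004": {"C1"}, "at0005": {"C2"}, "at0006": {"C3"}}
predicted = {"at0004": {"C1", "C7"}, "at0005": {"C2"}, "at0006": {"C8"}}
print(precision_recall(predicted, gold))  # (0.5, 0.6666666666666666)
```

In this toy case two of the four proposed concepts are correct (precision 0.5) and two of the three manual bindings are found (recall 2/3).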


Therefore during the course of implementing the framework to instantiate shadows and investigating
algorithms, \textbf{contribution C.4} has been made.


\begin{quote}
\textsf{    
Contribution C.4 Using the framework, this work has evaluated the effectiveness of this new approach, which uses
a mediating resource (a terminological shadow), by assessing both the viability and
the performance of the resource.}
\end{quote}


\subsection{Shadow utilisation} 


The author has demonstrated that the shadow approach is a multi-purpose tool that can enhance the archetype modelling approach in several ways. This thesis has demonstrated the applicability of shadows through two example applications, whose implications are discussed in the following subsections.
 
\subsubsection{Coverage discovery using shadows} 
The first example application used shadows to discover the clinical concept coverage of an archetype repository.
The largest available archetype repository was used: shadows were generated for all NHS archetypes. The resulting shadows revealed that certain clinical areas have an adequate number of archetypes for use in their clinical scenarios, while others are under-covered.


The results showed an interesting distribution of archetype data items across SNOMED-CT with respect to clinical care. A different distribution is likely among other archetype sets that are developed for purposes other than care. The work also revealed a number of relationships between archetypes and SNOMED-CT categories, which was a key intention of that aspect of the thesis.


It has been shown that shadows can represent the semantic content of archetypes and can be used to measure how closely related different archetypes are to each other. This led to the finding that particular sections of SNOMED-CT have a closer relationship with certain classes in the EHR information model of openEHR.

There are a number of benefits to this study. A full overview of a repository can guide archetype modellers in balancing their creation of archetypes across their clinical areas of interest. The study also reflected the relationship between the Archetype Object Model and the SNOMED-CT concept model. Developers of clinical archetypes can become aware of the clinical areas that are covered by existing repositories.
These findings can be further explored to enhance the integration of EHR information models and
clinical terminologies. For instance, new research questions can be posed, such as `Should
\emph{entry} classes align with the structure of SNOMED-CT base categories?'. The study also provides research material on clinical concept coverage, so that the developers of SNOMED-CT can be aware of how concepts might be used in an EHR such as openEHR. In due course, EHR information modellers may adopt the style of SNOMED-CT to produce a terminology-friendly model.


This example obtained the coverage of clinical concepts in an
archetype repository with respect to a major clinical terminology by generating terminological
shadows.  Shadows were created from a large number of archetypes, to view the coverage of clinical
concepts of this repository by using SNOMED-CT as a metric. The results of the coverage inspired the
discussion mentioned in section \ref{discu} in chapter 6. 






The overview of the clinical coverage of an archetype repository is an interesting and useful
example that showcases the applicability of terminological shadows. An estimate of clinical coverage
is of interest to new developers who are adopting existing archetypes or creating new ones. The
clinical coverage investigation estimated the coverage of an archetype repository for every category in SNOMED-CT.
Therefore, \textbf{contribution C.5a} can be stated as follows:
\begin{quote}
\textsf{    
C.5a The author has demonstrated the applicability of the shadow approach, by providing an example application that calculates the coverage of clinical concepts in an archetype repository with respect to a major clinical terminology.
}
\end{quote}

The coverage information was interpreted in section \ref{discu}. Although the coverage calculation
is an approximation, it provides information about all the archetypes in the repository with respect to
SNOMED-CT categories. It can show, for instance, how many archetype nodes are related
to each of the SNOMED-CT first- and second-level categories. These statistics are important because
they help archetype developers to determine how much metadata is available in a given archetype
collection for a particular clinical field. Development decisions can then be made, for instance, to
adopt existing archetypes or create new ones. Therefore, \textbf{contribution C.5b} can be stated as
follows:
\begin{quote}
\textsf{    
C.5b The resulting coverage has revealed notable statistics about the current archetype modelling process.
}
\end{quote}
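The coverage statistics described above amount to counting, for each SNOMED-CT category at a chosen depth, how many archetype nodes have a shadow concept under that category. The following Python sketch shows this counting step under simplifying assumptions: the is-a hierarchy is treated as a single-parent tree (SNOMED-CT itself allows multiple parents), the hierarchy and node identifiers are a toy example rather than real SNOMED-CT content, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy is-a fragment (child -> parent); illustrative only, not real SNOMED-CT content.
PARENT = {
    "Clinical finding": "ROOT",
    "Observable entity": "ROOT",
    "Procedure": "ROOT",
    "Finding of body temperature": "Clinical finding",
    "Fever": "Finding of body temperature",
    "Blood pressure": "Observable entity",
    "Systolic blood pressure": "Blood pressure",
    "Surgical procedure": "Procedure",
}


def category_at_level(concept: str, parent: dict[str, str], level: int) -> str:
    """Ancestor of `concept` at `level` (root = level 0, top-level categories = level 1)."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    path.reverse()  # now ordered from the root down to the concept
    return path[level] if level < len(path) else path[-1]


def coverage(shadows: dict[str, set[str]], parent: dict[str, str], level: int = 1) -> dict[str, int]:
    """Count distinct archetype nodes whose shadow reaches each category at `level`."""
    nodes_per_category: dict[str, set[str]] = defaultdict(set)
    for node_id, concepts in shadows.items():
        for concept in concepts:
            nodes_per_category[category_at_level(concept, parent, level)].add(node_id)
    return {cat: len(nodes) for cat, nodes in sorted(nodes_per_category.items())}
```

Counting distinct nodes rather than concepts prevents a node whose shadow holds several concepts in the same category from being counted more than once; setting \texttt{level} to 2 gives the second-level view.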

% discuss J.Allones paper
Coverage information has the potential to provide a wealth of information for future archetype
developers and for EHR integration. The coverage results can also be interpreted to reveal how
openEHR reference model classes are related to SNOMED-CT categories. For example, section
\ref{discu} revealed that the reference model classes `Action', `Observation' and `CodePhrase' are clinically
specific: they are likely to be associated with the clinical categories `Procedure', `Finding' and
`Qualifier value'. Although more thorough investigation needs to be carried out, the coverage could
help to unearth hidden relationships. Therefore \textbf{contribution C.5c} is: 
% add these discussion to ch 6

\begin{quote}
\textsf{    
C.5c The analysis of the coverage has also revealed useful information about the relationship between openEHR reference model classes and SNOMED-CT categories.
}
\end{quote}



\subsubsection{Archetype comparison by shadows} 


The second example application provides a method of comparing clinical archetypes. Section
\ref{sec:paper3} has demonstrated the process of using terminological shadows as the basis of similarity measurement. The demonstration led to a discussion of the extensibility of the shadow approach and showcased terminological shadows as a mediating resource with potential for application to a variety of use cases. The shadow approach translates the semantic meaning of archetypes into concepts in SNOMED-CT, so the ``semantic relatedness'' of archetypes can be inferred from the distance between nodes in the SNOMED-CT network. This archetype-comparing application would benefit the developers of archetypes in a number of ways:
\begin{enumerate}[(a)]
  \item  To discover archetypes that have overlapping content and eliminate redundant information
  \item  To compare archetype modelling approaches by finding similar archetypes
  \item  To monitor the growth of archetypes in a particular clinical field, for example find all archetypes related to a type of disease
  \item  To allow users to search archetypes based on semantic meanings
  \item  To be able to build networks of archetypes for further analysis
\end{enumerate}
With future improvements to the shadow creation method and algorithms, further utilities can be built on terminological shadows.

Specifically, the following claims can be made about this work:

\begin{enumerate}[(a)]
  \item Semantic similarity measurement from graph theory was employed in conjunction with shadows to enable comparison between archetypes. Shadows were generated `on the fly' and compared with the shadows of other archetypes. A semantic similarity measure was calculated to evaluate how ``close'' two archetypes are semantically. A more detailed view can also be generated within archetypes to show their relationship (which data nodes are closest and what their context is). This work can aid the management of archetypes by identifying potentially redundant archetypes for merging to improve efficiency.
 
  \item The results also showed that the lowest-common-ancestor method can be used to identify similar nodes among archetypes. The comparison functionality can also be used in other ways, such as concept visualisation and quality control.
\end{enumerate}
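The lowest-common-ancestor idea can be illustrated with the Wu and Palmer measure, one well-known LCA-based similarity that scores two concepts by the depth of their lowest common ancestor relative to their own depths. The Python sketch below is an illustration under assumptions and not necessarily the metric used in section~\ref{sec:paper3}: it uses a toy single-parent hierarchy with hypothetical concept labels, and it aggregates concept scores into an archetype-level score with a symmetric best-match average, which is one common choice among several.

```python
from typing import Optional

# Toy single-parent is-a fragment (child -> parent); illustrative only.
PARENT = {
    "Observable entity": "ROOT",
    "Clinical finding": "ROOT",
    "Blood pressure": "Observable entity",
    "Systolic BP": "Blood pressure",
    "Diastolic BP": "Blood pressure",
    "Fever": "Clinical finding",
}


def path_to_root(concept: str, parent: dict[str, str]) -> list[str]:
    """Path from the concept up to the root (concept first, root last)."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def lowest_common_ancestor(a: str, b: str, parent: dict[str, str]) -> Optional[str]:
    ancestors_of_a = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):  # walk upwards from b
        if node in ancestors_of_a:
            return node
    return None


def depth(concept: str, parent: dict[str, str]) -> int:
    """Number of nodes on the path to the root (the root itself has depth 1)."""
    return len(path_to_root(concept, parent))


def wu_palmer(a: str, b: str, parent: dict[str, str]) -> float:
    lca = lowest_common_ancestor(a, b, parent)
    if lca is None:
        return 0.0
    return 2 * depth(lca, parent) / (depth(a, parent) + depth(b, parent))


def shadow_similarity(shadow_a: set[str], shadow_b: set[str], parent: dict[str, str]) -> float:
    """Symmetric average of best-match concept similarities between two shadows."""
    if not shadow_a or not shadow_b:
        return 0.0
    best_a = [max(wu_palmer(x, y, parent) for y in shadow_b) for x in shadow_a]
    best_b = [max(wu_palmer(y, x, parent) for x in shadow_a) for y in shadow_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))
```

In this toy hierarchy `Systolic BP' and `Diastolic BP' share the ancestor `Blood pressure' at depth 3 and both sit at depth 4, giving $2 \cdot 3 / (4 + 4) = 0.75$, whereas `Systolic BP' and `Fever' meet only at the root and score $2/7$.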

The demonstration of the archetype comparing method and the analysis of the results lead to
\textbf{contribution C.6}:
\begin{quote}
\textsf{    
C.6 This work has demonstrated a method of measuring the similarity of clinical content in EHR meta-data such as archetypes by comparing terminological shadows.
}
\end{quote}

\textbf{Contributions C.1 to C.6} have demonstrated the application of the shadow approach to the specific case of recording health information, a problem domain that has both rich, descriptive information model meta-data and a very large and mature terminology in the form of SNOMED-CT. In the author's view, this specific case also demonstrates the general applicability of the shadow approach under similar conditions. 


In other words, where a very detailed and large domain-specific information model requires references to a large and complex external ontology, the shadow approach documented here can be applied. 


In such circumstances, this work has demonstrated that a mediating resource can be established to help link the domain concepts that appear in both the information model and the external terminology. To facilitate the integration, semantic binding algorithms can be applied and assessed with this mediating resource.
 




\section{Insights}
 
Electronic health records and clinical terminologies are each supported by large user groups who are not explicitly exposed to the fundamental building blocks of these systems. Each group of stakeholders is often expert in only one of the two ``worlds'': either a clinical user who is familiar with the content of clinical terminologies but not the complicated structure of an EHR, or a technical person who knows the EHR information model very well but not the clinical concepts in a terminology.


The process of creating a framework for the integration of EHR information models and clinical terminologies, as reported in this work, produced experience relevant to both user roles. This experience can be briefly summarised as follows:
 
\emph{Remark 1:} The information models of an electronic health record are sometimes designed to
store information from a specific perspective. For instance, the EHR for a cancer patient is usually
centred on oncological treatments \parencite{james2001onco}, and organises the patient information by
surgeries, histopathological examinations and so on. Other oncological EHR systems may model
clinical information with respect to treatment, such as chemotherapy, and to laboratory results. Because cancer patients tend to have a greater variety of clinical information than patients with other diseases, a generalised EHR information model is not always sufficient to capture all useful clinical data. The openEHR reference model and the HL7 RIM are examples of general information models. The drawback of general models is that they are deliberately balanced to remain general and are not biased towards clinical data associated with specific diseases such as cancer. However, an uneven distribution of clinical concepts is legitimate and commonplace, and is even a natural characteristic of an expanding terminological system. A clinical classification may contain disproportionate numbers of concepts for different medical conditions. For example, soft tissue cancer can have substantially more subtypes than breast cancer.
Therefore the mapping of terminological resources to an EHR needs to take the clinical domain context into account. 
% This is particularly true when the context is not well covered by the archetype. E.g the context (meta information) of a lab test is needed to make a recording meanin
% need to reference paper2 findings about different densities in snomed
 
\emph{Remark 2:} Despite the different purposes of EHR information models and clinical
terminologies, a generic EHR information model has base concepts that are similar to the base
classifications in a generic terminology such as SNOMED-CT. In the literature review it was observed
that EHR information models such as openEHR, EN13606 and HL7 version 3 share the characteristic that
their base classes are relatively general entities. Section~\ref{sec:snorm-class} of chapter 6
discusses the relationships between the base classes of information models and SNOMED-CT. To provide a
specific example from chapter 6, the base entry classes in the openEHR reference model are:
\emph{``Observation'', ``Evaluation'', ``Action'' and ``Instruction''}.
The closest equivalent SNOMED-CT base classes are:
\emph{``Clinical finding'', ``Procedure'' and ``Observable entity''}.
According to the openEHR specification, the entry class \emph{``Observation''} is used to record raw
information that is considered uninterpreted, such as ECG data or blood pressure.
\emph{``Evaluation''} can be used to describe interpreted data such as a report or note produced by
a clinician. \emph{``Action''} and \emph{``Instruction''} are activities that are performed on the
subject of care. These classes are likely to be related to concepts from \emph{``Observable
entity''}, \emph{``Clinical finding''} and \emph{``Procedure''} in SNOMED-CT. Section~\ref{sec:snorm-class} of chapter 6 attempts to quantify some of these correspondences. This represents the emergence of a common underlying modelling approach in the development of general EHR information models and general terminologies.


\emph{Remark 3:} From the observations in section~\ref{sec:1st-lvl} it can be noted that the density distribution of data items in archetypes that are associated with SNOMED-CT concepts is influenced by the design of the clinical terminology. The well-defined, well-understood and mature parts of SNOMED-CT are more likely to be used to annotate archetypes. Archetype designers should therefore be aware of the structure and content of the mature parts of the terminology.
 
\emph{Remark 4:} The design intentions of the archetypes used in this study are associated with
documenting the healthcare of patients and the protocols and procedures at healthcare sites such as
hospitals (both administrative and medical). Other sets of archetypes that alternatively or additionally
focus on population health, clinical research or biomedical research might emerge in coming years, and
these future archetypes might have a different landscape of SNOMED-CT concept association densities
\parencite{Dentler2013qlt}. 


\emph{Remark 5:} Clinical terminology content, such as that of SNOMED-CT, can be used as constraints in archetypes. In contrast to the constraint types available in the Archetype Object Model, SNOMED-CT concepts can be associated with archetype data items to assign semantic meanings. This could lead to future improvements in the model design of archetypes and potentially other EHR meta data.
 
\emph{Remark 6:} Information models are usually specific to certain system functionalities, and a
generic information model is difficult to incorporate within different systems. Large-scale clinical terminologies such as SNOMED-CT are designed to be system-neutral; they can be used with many different systems as a reference library of medical concepts.
The more general the EHR information model is, the more challenging it becomes to integrate with terminologies. The developers in the two communities need to be aware of each other's work during the development of models.
Neither terminologies nor EHR information models are designed to express every aspect of clinical information.


\emph{Remark 7:} The mainstream of health informatics seeks to build systems based on general information models, which afford a certain level of flexibility for integration. However, this comes at the cost of a reduced level of constraint. The lack of specificity of a general EHR information model can be partially compensated for by injecting external semantic references such as a specific terminology. Semantic interoperability can be improved if the EHR information model can be easily integrated with terminologies, and this integration depends partially on how well the differences and overlaps between the EHR information model and terminologies have been studied.




%A general-purpose EHR information model often employs generic information containers so they can be extended and specialised to express more specific clinical meanings. For example, a generic class from the openEHR reference model can serve as a base for a variety of clinical information. The generic container would often reference a concept or term from a clinical terminology to express unambiguous clinical meaning. However potential conflicts / overlaps may occur between certain data items from the EHR information model and the clinical terminology, which may result in inconsistent clinical information.




\subsection{Limitations}


In this thesis there has been discussion about the limitations of each individual experiment and study.
 
Section 5.3.6 has discussed the limitations of the \emph{tf-idf} based algorithm and its performance. It has been noted that this popular term weighting scheme was designed to solve problems in the field of information retrieval. In this work the weighting scheme has been applied to clinical statements that consist of no more than 10 words. Further optimisation remains to be done, since the \emph{tf-idf} algorithm works better for longer documents.
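The effect of short statements on the weighting can be seen from the standard \emph{tf-idf} formulation, stated here as an assumption since several variants exist. For a term $t$ in a clinical statement $d$ drawn from a collection of $N$ statements, where $\mathrm{tf}_{t,d}$ is the number of occurrences of $t$ in $d$ and $\mathrm{df}_t$ is the number of statements containing $t$,
\[
  w_{t,d} = \mathrm{tf}_{t,d} \cdot \log \frac{N}{\mathrm{df}_t}.
\]
In a statement of at most 10 words a term rarely occurs more than once, so $\mathrm{tf}_{t,d} = 1$ for almost every term and $w_{t,d}$ reduces to $\log (N/\mathrm{df}_t)$. The weighting then behaves like an idf-only ranking, and the within-document frequency signal that \emph{tf-idf} exploits in longer documents is largely lost.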


Section 6.1.3.2 has noted that the coverage of SNOMED-CT is an approximation based on
terminological shadows, which are the results of the linking algorithm. However, the algorithm was intended only to showcase the framework; it can be made more sophisticated and extended in future work.
 
Section 6.2.5 has discussed the limitations of the technique used to compare archetypes with
terminological shadows. It has been recognised that the semantic metric used in the experiment can be made more robust. However, for an exploratory study, adopting a generic method of measuring semantic similarity from the field of network and graph theory seems reasonable.


In principle, the terminological shadow framework is a generic platform for testing different
EHR-terminology integration approaches. The framework needs to be elaborated in order to produce more sophisticated clinical applications. The thesis demonstrated only two example applications; more applications need to be developed to maximise the benefit of using terminological shadows.

\subsection{Characteristics of a terminology that help to produce terminological shadows}
This thesis has explored the relationship between EHR information models and clinical terminologies. Based on the experience of the author with using SNOMED-CT to build shadows, it can be suggested that the terminological shadow approach can benefit from terminologies that have the following qualities:  
\begin{itemize}
\item The terminology is a classification with relationships between concepts, such as `is-a'
\item The network of concepts has a tree-like structure
\item The descriptive information, such as preferred terms, is widely used and has little ambiguity
\item The terminology has linguistic features such as synonyms and word stems
\end{itemize}
The corpus of archetypes is still maturing, as is the quality of terminological bindings that are shared with the community. Until such time as the quantity and quality of the archetype resources is improved to allow a study that has full statistical relevance, the validation of these considered opinions along with further validation of other aspects of the work is left by the author for future work.


\section{Future work}
 
The terminological shadow approach can be extended in many ways to show its value.
Future graphical applications can use `Shadows' to visualise the coverage of an archetype repository
by generating a `concept cloud'. Mimicking the way that a word cloud may summarise the topics of a
set of documents, such an artefact can enhance the usability of a repository when the users need to search and browse archetypes.
 
Future researchers may also wish to investigate how to derive and use additional contextual
information that can enhance the effectiveness of the Shadow approach for nodes in certain parts of
an archetype. For instance, at the leaf level of an archetype, one is likely to find constraints on
an element of a data type to store clinical data. In many cases this takes the form of a standard
coded medical term or a condition. Codes from the most frequently occurring group, the category
\emph{Clinical history and observation findings}, are examples of SNOMED-CT codes that would
comfortably map into this part of an archetype, and the majority of SNOMED-CT codes are candidates for
this type of mapping. On the other hand, the `verbs' or `link' concepts in the SNOMED-CT model that
connect core concepts and their potential modifiers are called linkage concepts. According to its
coverage information, the highly utilised \emph{Attribute} category has the potential to relate closely
to the organisational elements of archetypes. Another possible piece of context information is
whether a searched term is a member of the \emph{Person} category, which is associated with the
demographic model in the Archetype Object Model.


Future work is planned as follows:
\begin{description}
  \item[Utilise meta information from EHR and SNOMED-CT] The author sees a
	  potential for terminological `Shadows' of archetypes to be used to
	  aid better integration between the EHR information model and clinical
	  terminologies such as SNOMED-CT. Therefore, in future, the author will
	  investigate the relationship between the meta information of archetype
	  terms and the mapped equivalent SNOMED-CT concepts. This meta information
	  may include the data types of the particular EHR information model that
	  the archetype is constraining and the intended applicability of
	  SNOMED-CT concepts in its concept model. 
	  This is intended to improve
	  the mapping of appropriate SNOMED-CT concepts according to the
	  context of the archetype term. 

For example, the following rules can be adopted to enhance the accuracy of terminological shadows: 
\begin{itemize}
	\item Reference model attributes such as `events' could restrict 
	  the search for SNOMED-CT concepts to procedure- or observation-related categories. 
	\item The potential mapping between RM classes and SNOMED-CT categories mentioned in section~\ref{sec:snorm-class}.
	\item The data types of RM leaf nodes and their possible corresponding SNOMED-CT categories; 
		for example, a numeric measurement is likely to be associated with Observation.
\end{itemize}
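The first and third rules above could be applied as a post-filter on the candidate concepts returned by the linking algorithm. The following Python sketch assumes a hypothetical rule table that maps a reference model context (an attribute name such as `events', or an openEHR data type name such as \texttt{DV\_QUANTITY}) to the SNOMED-CT top-level categories that remain admissible; the category sets, the candidate concepts and the function name are illustrative choices, not rules validated in this thesis. When several rules apply, their admissible sets are intersected, so each piece of context can only narrow the search.

```python
from typing import Iterable, Optional

# Hypothetical rule table: RM context -> admissible SNOMED-CT top-level categories.
CONTEXT_RULES: dict[str, set[str]] = {
    "events": {"Procedure", "Observable entity"},  # procedure- or observation-related
    "DV_QUANTITY": {"Observable entity"},          # numeric measurement leaf
}

# Illustrative candidates from a linking algorithm, with their top-level categories.
CANDIDATES = ["Systolic BP", "Hypertension", "BP measurement"]
CATEGORY_OF = {
    "Systolic BP": "Observable entity",
    "Hypertension": "Clinical finding",
    "BP measurement": "Procedure",
}


def filter_candidates(
    candidates: list[str], category_of: dict[str, str], context: Iterable[str]
) -> list[str]:
    """Keep candidates whose category satisfies every applicable context rule."""
    allowed: Optional[set[str]] = None
    for key in context:
        rule = CONTEXT_RULES.get(key)
        if rule is not None:
            # Intersect so that each additional piece of context narrows the search.
            allowed = set(rule) if allowed is None else allowed & rule
    if allowed is None:
        return list(candidates)  # no applicable rule: leave the ranking unchanged
    return [c for c in candidates if category_of.get(c) in allowed]
```

Because the filter preserves the original order of the candidates, it can be placed after any ranking step without altering how the remaining concepts are ranked.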



\item[Extend the shadow comparison function] The result of this work can provide a better automated means of analysing archetypes that are
created in different development contexts and that may contain overlapping information for the same clinical setting presented in different ways.
The archetype comparison function should be improved by utilising the internal structure of
archetypes and SNOMED-CT. 

For example, the following rules can be adopted to enhance archetype comparison: 
\begin{itemize}
	\item Reference model attributes such as `events' could provide extra context information when comparing SNOMED-CT concepts.
	\item \textcite{bisbal2009arch-align} use anchor nodes to align archetype nodes; the same idea could also be 
		applied to enhance the terminological shadow approach.
	\item Certain categories of SNOMED-CT, such as \emph{Attribute} and \emph{Device}, should be avoided or handled specially. 
		The pairing of nodes in archetypes can be improved by modifying the algorithm to look at their parent 
		and child nodes. For example, an ELEMENT that contains a text string should not be paired with an ELEMENT that contains a numeric value.
\end{itemize}
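The last rule can be sketched as a type-aware pairing step. In the Python sketch below, nodes from two archetypes are paired greedily by descending similarity of their shadow concepts, but only when their data types match, so an ELEMENT holding text is never paired with one holding a numeric value. The \texttt{Node} record, the greedy strategy, the threshold and the example similarity function are assumptions for illustration; a maximum-weight matching could replace the greedy step, and the parent and child context mentioned above is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Node:
    """A leaf ELEMENT of an archetype, reduced to what the pairing needs."""
    node_id: str
    data_type: str  # openEHR data type name, e.g. "DV_TEXT" or "DV_QUANTITY"
    concept: str    # concept taken from the node's terminological shadow


def pair_nodes(
    left: list[Node],
    right: list[Node],
    similarity: Callable[[str, str], float],
    threshold: float = 0.0,
) -> list[tuple[str, str, float]]:
    """Greedy one-to-one pairing of nodes whose data types match."""
    candidates = [
        (similarity(a.concept, b.concept), a.node_id, b.node_id)
        for a in left
        for b in right
        if a.data_type == b.data_type  # e.g. text never pairs with numeric
    ]
    candidates.sort(key=lambda item: (-item[0], item[1], item[2]))
    used_left: set[str] = set()
    used_right: set[str] = set()
    pairs = []
    for score, a_id, b_id in candidates:
        if score > threshold and a_id not in used_left and b_id not in used_right:
            pairs.append((a_id, b_id, score))
            used_left.add(a_id)
            used_right.add(b_id)
    return pairs
```

Any concept-level measure, such as an LCA-based similarity over SNOMED-CT, can be passed as \texttt{similarity}; the data type check is applied before scoring, so two nodes bound to the same concept are still not paired when one stores text and the other a quantity.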




\item[Refine the shadow creation algorithm] Terminological shadows, as an approximate representation of the semantic content of archetypes or
other clinical meta-data resources, express the clinical concepts in archetypes with SNOMED-CT
concepts. With further elaboration and refined shadow creation algorithms, they have the
potential to be used as a tool to discover relations between archetypes and SNOMED-CT and to manage
archetypes semantically (an idea that has partially been adopted by \textcite{Allones2013sno}). 
Further research will also utilise `Shadows' to compare individual archetypes and identify similarity among heterogeneous archetype designs. Archetypes would be compared by matching their `Shadows' and analysing the SNOMED-CT concepts in their normalised form \parencite{snonorm}.


Good archetype modelling practice could potentially improve terminological shadow results: for instance, the more standardised descriptions a well-designed archetype contains, the more accurate the SNOMED-CT concepts stored in the resulting terminological shadow will be. The terminological shadow approach could also inform archetype modelling by suggesting clinically relevant information and by helping modellers find appropriate bindings. In addition, an archetype modeller can view the pre-calculated terminological shadows of incomplete archetypes, which may suggest additional archetype-related structures. 

% ADD templates (openehr & HL7) description
\item[HL7 CDA shadows] A major extension of this work would incorporate HL7 CDA so that terminological shadows cover the
space of this widely known family of information standards. This would broaden the applicability of the
terminological shadow approach within the healthcare domain. The author also intends to apply the
shadow approach outside the healthcare domain. As mentioned in
\textcite{nadkarni1999eavcr}, a generic EAV/CR based information model can also potentially benefit from the equivalent of terminological shadows in domains outside healthcare. The general contribution of this work is that it is possible to re-apply the shadow approach in suitable circumstances in other domains. It is the author's intention to re-apply the shadow approach in another case where an EAV/CR or similar multi-level model requires references to a complementary complex ontology.   
\end{description}

\section{Conclusions}
\label{final_con} 
This work has sought to enhance the integration of EHR information models with clinical
terminologies. In the course of exploring the research question, which aims to
achieve improved semantic interoperability by facilitating EHR-terminology integration, a number of contributions have been made to the research area.





The core contribution of the project is the investigation of the relationship between
two essential components of Electronic Health Records whose goal is to achieve semantic
interoperability in a health environment composed of heterogeneous clinical systems. In a nutshell, this work combines a terminological resource such as SNOMED-CT with EHR meta data such as archetypes to provide semantic enhancement for the two-level EHR model.


This work has addressed a research question that is central to semantic interoperability in electronic health information exchange: how to improve semantic interoperability by facilitating better integration of EHR information models and clinical terminologies.


The author has reviewed contemporary EHR information models, particularly the specifications of the openEHR reference model and archetype object model, and a sophisticated clinical terminology to establish a use case scenario for answering the above research question. Having established this use case scenario, the author hypothesised that a mediating resource can facilitate the integration between an EHR information model and a clinical terminology.


The next step in this study was to investigate current integration mechanisms for linking the terminological resource with openEHR archetypes; the author reviewed a series of related technologies and projects that aim at this integration. The author then conceptualised a mediating resource called ``Terminological Shadows'', as a representation of the overlapping health information between openEHR archetypes and the clinical terminology SNOMED-CT. 


In addition to implementing terminological shadows, the author designed and built a generic framework in order to test the efficiency of shadows. As a first step in testing the validity of the shadow approach, the author verified a small set of shadows by matching the mediating resource against annotations made by human experts. The results of this investigation indicated that, although the algorithm that links SNOMED-CT with archetypes needs elaboration, the framework and the shadow approach can be successfully combined to facilitate the development of more advanced integration between EHR information models and terminologies. In the next evaluation, more sample archetypes were used, and the results showed that the shadow framework is a good tool for analysing the performance of algorithms that promote better linkage of archetypes and SNOMED-CT concepts. In addition, the performance analysis of the algorithm adopted in the default implementation of the framework showed that a \emph{tf-idf} based algorithm can be used to approximate the SNOMED-CT categories of clinical concepts in archetypes.


The author further utilised the framework to provide two example applications of the shadow approach. Firstly, based on the conclusion that the default shadow can approximate SNOMED-CT categories, the author created shadows for a whole archetype repository to reveal its SNOMED-CT coverage. The results showed that certain SNOMED-CT categories are well covered while others are not. The results also revealed interesting relationships between the SNOMED-CT categories and the openEHR reference model classes. It has been shown that the coverage information could benefit both archetype and SNOMED-CT developers. Secondly, the author extended the shadow framework to build an archetype comparison tool. A semantic similarity measure was adopted for comparing terminological shadows. As a result, the application has shown that the relatedness of archetypes can be obtained by comparing the shadows derived from them. The archetype comparison tool can benefit archetype management and could, in the future, enable semantic searching of the clinical content in archetypes. 


To summarise, this thesis complements the two-level EHR modelling approach by incorporating the
well-established clinical terminology system SNOMED-CT. The author studied the correspondences
between SNOMED-CT and archetypes, and investigated the applicability of integrating the two. This
thesis has introduced and described the creation of a process to generate a semantic artefact that
the author has called a ``Terminological Shadow''.  The author has verified the approach by carrying
out a set of evaluations and then demonstrated the usefulness of ``Terminological Shadows''.
The thesis has demonstrated that ``Terminological Shadows'' benefit the existing technologies that manage and maintain archetypes.




%todo -- future work IMPORTANT!!! Contribution: the shadow approach can be used for machine learning purpose as
%a whole new method for creating intelligent term binding, as a statistic resource and training
%resource, combined with human annotation, to provide a NLP-like training service for a EHR model
%(e.g openEHR training data, HL7 training data etc)
